## [1] 1599 12
## [1] "fixed.acidity" "volatile.acidity" "citric.acid"
## [4] "residual.sugar" "chlorides" "free.sulfur.dioxide"
## [7] "total.sulfur.dioxide" "density" "pH"
## [10] "sulphates" "alcohol" "quality"
## Observations: 1,599
## Variables: 12
## $ fixed.acidity <dbl> 7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.…
## $ volatile.acidity <dbl> 0.700, 0.880, 0.760, 0.280, 0.700, 0.660, 0…
## $ citric.acid <dbl> 0.00, 0.00, 0.04, 0.56, 0.00, 0.00, 0.06, 0…
## $ residual.sugar <dbl> 1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2.0…
## $ chlorides <dbl> 0.076, 0.098, 0.092, 0.075, 0.076, 0.075, 0…
## $ free.sulfur.dioxide <dbl> 11, 25, 15, 17, 11, 13, 15, 15, 9, 17, 15, …
## $ total.sulfur.dioxide <dbl> 34, 67, 54, 60, 34, 40, 59, 21, 18, 102, 65…
## $ density <dbl> 0.9978, 0.9968, 0.9970, 0.9980, 0.9978, 0.9…
## $ pH <dbl> 3.51, 3.20, 3.26, 3.16, 3.51, 3.51, 3.30, 3…
## $ sulphates <dbl> 0.56, 0.68, 0.65, 0.58, 0.56, 0.56, 0.46, 0…
## $ alcohol <dbl> 9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10.0, 9.…
## $ quality <int> 5, 5, 5, 6, 5, 5, 5, 7, 7, 5, 5, 5, 5, 5, 5…
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. : 4.60 Min. :0.1200 Min. :0.000 Min. : 0.900
## 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090 1st Qu.: 1.900
## Median : 7.90 Median :0.5200 Median :0.260 Median : 2.200
## Mean : 8.32 Mean :0.5278 Mean :0.271 Mean : 2.539
## 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420 3rd Qu.: 2.600
## Max. :15.90 Max. :1.5800 Max. :1.000 Max. :15.500
## chlorides free.sulfur.dioxide total.sulfur.dioxide
## Min. :0.01200 Min. : 1.00 Min. : 6.00
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 22.00
## Median :0.07900 Median :14.00 Median : 38.00
## Mean :0.08747 Mean :15.87 Mean : 46.47
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 62.00
## Max. :0.61100 Max. :72.00 Max. :289.00
## density pH sulphates alcohol
## Min. :0.9901 Min. :2.740 Min. :0.3300 Min. : 8.40
## 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500 1st Qu.: 9.50
## Median :0.9968 Median :3.310 Median :0.6200 Median :10.20
## Mean :0.9967 Mean :3.311 Mean :0.6581 Mean :10.42
## 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10
## Max. :1.0037 Max. :4.010 Max. :2.0000 Max. :14.90
## quality
## Min. :3.000
## 1st Qu.:5.000
## Median :6.000
## Mean :5.636
## 3rd Qu.:6.000
## Max. :8.000
1. The dataset consists of 1,599 rows and 12 columns.
2. The range of fixed acidity is wide, from a minimum of 4.60 to a maximum of 15.90.
3. Quality ranges from 3 to 8, with a mean of about 5.6.
4. pH ranges from 2.74 to 4.01, with a median of 3.31.
5. Alcohol ranges from 8.4 to 14.9, with a mean of 10.42.
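As a minimal sketch, the shape and ranges listed above can be read off with base R. The data frame `wine_head` is a hypothetical stand-in that holds only the first five rows shown in the glimpse output, so its ranges are narrower than those of the full data; in the actual analysis the same calls would presumably be run on `redwine`.

```r
# First five rows copied from the glimpse output above (not the full dataset).
wine_head <- data.frame(
  fixed.acidity = c(7.4, 7.8, 7.8, 11.2, 7.4),
  pH            = c(3.51, 3.20, 3.26, 3.16, 3.51),
  alcohol       = c(9.4, 9.8, 9.8, 9.8, 9.4),
  quality       = c(5L, 5L, 5L, 6L, 5L)
)

# dim() reports rows first, then columns.
print(dim(wine_head))

# range() returns c(min, max); sapply() applies it to every column.
ranges <- sapply(wine_head, range)
rownames(ranges) <- c("min", "max")
print(ranges)
```

Because `dim()` always lists rows before columns, the `1599 12` at the top of the output reads as 1,599 rows and 12 columns.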
The quality scores in the red wine dataset appear to be roughly normally distributed, and most wines have a quality rating of 5 or 6.
The distribution of fixed acidity is right-skewed, with most values between about 7 and 9.
The distribution of volatile.acidity is right-skewed, with most values between 0.4 and 0.6.
The distribution of citric.acid is right-skewed, with most values between 0.09 and 0.42 (the first and third quartiles). The values at 0 stand out as possible outliers.
The distribution of residual sugar is also right-skewed, with most values around 2.
The distribution of chlorides is right-skewed, with most values between 0.07 and 0.09.
The distribution of free.sulfur.dioxide is right-skewed, peaking around 7.
The distribution of total.sulfur.dioxide is right-skewed, peaking around 21.
The distribution of density is approximately normal, centered just below 1.
The distribution of pH is approximately normal, centered around 3.3.
The distribution of sulphates is right-skewed, peaking around 0.55.
The distribution of alcohol is right-skewed, peaking between 9.4 and 9.6.
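One rough way to cross-check the skewness remarks without the plots is to compare each mean with its median. The sketch below uses only numbers copied from the `summary()` output above; the `gap` measure (mean minus median, divided by the interquartile range) is an informal heuristic introduced here for illustration, not a formal skewness statistic.

```r
# Mean, median and quartiles copied from the summary() output above.
stats <- data.frame(
  variable = c("residual.sugar", "total.sulfur.dioxide", "alcohol",
               "density", "pH"),
  mean     = c(2.539, 46.47, 10.42, 0.9967, 3.311),
  median   = c(2.200, 38.00, 10.20, 0.9968, 3.310),
  q1       = c(1.900, 22.00,  9.50, 0.9956, 3.210),
  q3       = c(2.600, 62.00, 11.10, 0.9978, 3.400)
)

# Gap between mean and median, scaled by the interquartile range.
# Clearly positive values hint at a right tail; values near 0 hint at symmetry.
stats$gap <- (stats$mean - stats$median) / (stats$q3 - stats$q1)

# Sort from most to least right-skewed according to this heuristic.
print(stats[order(-stats$gap), c("variable", "gap")])
```

With the full data loaded, a histogram such as `hist(redwine$alcohol)` would be the more direct check.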
This dataset contains 1,599 wines and 12 variables, all of them numerical. One of them, quality, is the outcome variable, and the other 11 are chemical properties (fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol).
The main question of interest: do these chemical properties affect red wine quality or not?
In my opinion, some chemical properties (alcohol, citric acid, fixed acidity, volatile acidity) may affect the quality of the wine.
I haven't created any new features so far.
The dataset was already tidy, so its format did not need to change.
##
## Pearson's product-moment correlation
##
## data: redwine$alcohol and redwine$quality
## t = 21.639, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4373540 0.5132081
## sample estimates:
## cor
## 0.4761663
There is a moderate positive correlation between alcohol and quality (r ≈ 0.48): as alcohol increases, quality tends to increase too.
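To show where the numbers in the test output come from, the sketch below recomputes the t statistic and the 95% confidence interval from just the reported correlation and the sample size. It uses the standard formulas for a Pearson correlation test (a t statistic with n - 2 degrees of freedom and a Fisher z-transformed interval); the variable names are chosen here for illustration, and the results should agree with the output above up to rounding.

```r
# Values copied from the cor.test() output above.
r <- 0.4761663   # sample correlation between alcohol and quality
n <- 1599        # number of wines, so df = n - 2 = 1597

# t statistic for the null hypothesis that the true correlation is 0.
t_stat <- r * sqrt((n - 2) / (1 - r^2))

# 95% confidence interval via the Fisher z-transformation.
z  <- atanh(r)            # transform r to the z scale
se <- 1 / sqrt(n - 3)     # approximate standard error on the z scale
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)   # back-transform the limits

print(round(t_stat, 3))
print(round(ci, 4))
```

With a sample this large the interval is narrow, which is why even the weaker correlations reported further down come out as statistically significant.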
##
## Pearson's product-moment correlation
##
## data: redwine$citric.acid and redwine$quality
## t = 9.2875, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1793415 0.2723711
## sample estimates:
## cor
## 0.2263725
There is a weak positive correlation between citric acid and quality (r ≈ 0.23).
##
## Pearson's product-moment correlation
##
## data: redwine$fixed.acidity and redwine$quality
## t = 4.996, df = 1597, p-value = 6.496e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.07548957 0.17202667
## sample estimates:
## cor
## 0.1240516
There is a weak positive correlation between fixed acidity and quality (r ≈ 0.12).
##
## Pearson's product-moment correlation
##
## data: redwine$volatile.acidity and redwine$quality
## t = -16.954, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4313210 -0.3482032
## sample estimates:
## cor
## -0.3905578
There is a moderate negative correlation between volatile acidity and quality (r ≈ -0.39).
##
## Pearson's product-moment correlation
##
## data: redwine$alcohol and redwine$density
## t = -22.838, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5322547 -0.4583061
## sample estimates:
## cor
## -0.4961798
As the graph shows, there is a moderate negative correlation between alcohol and density (r ≈ -0.50).
Alcohol is related to both red wine quality and density: as alcohol increases, quality tends to increase and density tends to decrease.
Alcohol: positive correlation with quality, negative correlation with density.
Quality: positive correlation with alcohol, negative correlation with volatile acidity.
Fixed.acidity: strong positive correlation with citric.acid and negative correlation with pH.
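Pairwise relationships like these can be checked in one step with a correlation matrix. The sketch below is only an illustration: `wine_head` is a hypothetical stand-in holding the first seven rows visible in the glimpse output, far too few to estimate the real correlations, so only the mechanics and the signs are of interest. On the full data the same `cor()` call would presumably be applied to the matching columns of `redwine`.

```r
# First seven rows copied from the glimpse output above (not the full dataset).
wine_head <- data.frame(
  fixed.acidity = c(7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9),
  citric.acid   = c(0.00, 0.00, 0.04, 0.56, 0.00, 0.00, 0.06),
  pH            = c(3.51, 3.20, 3.26, 3.16, 3.51, 3.51, 3.30)
)

# cor() on a data frame returns the matrix of pairwise Pearson correlations.
cor_mat <- cor(wine_head)
print(round(cor_mat, 2))
```

The matrix is symmetric with ones on the diagonal, so each pair only needs to be read once.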
Among the features examined, alcohol has the strongest relationship with quality.
The graph suggests that the density of the wine does not have much effect on quality, while alcohol does.
It appears that as fixed acidity increases, pH decreases and quality increases.
I observed that wines with more alcohol and higher fixed acidity tend to have higher quality, while density alone does not have much effect on wine quality.
Wines with lower pH and higher fixed acidity tend to be of higher quality.
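A simple numeric companion to these plot-based observations is the average of a feature within each quality level. The sketch below is a hedged illustration on the first eight rows visible in the glimpse output (`wine_head` is a hypothetical stand-in, and eight rows cannot confirm the trend); with the full data the same `tapply()` call would presumably be run on `redwine$alcohol` and `redwine$quality`.

```r
# First eight rows copied from the glimpse output above (not the full dataset).
wine_head <- data.frame(
  alcohol = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4, 9.4, 10.0),
  quality = c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L)
)

# Mean alcohol content within each quality score.
mean_alcohol <- tapply(wine_head$alcohol, wine_head$quality, mean)
print(round(mean_alcohol, 2))
```

If mean alcohol rises steadily with the quality score on the full data, that would support the observation above. The same grouping could be repeated for fixed acidity, pH, and density.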
Quality is the main interest of my exploration. We can see that most of the wines have a medium quality rating (5 or 6).
The graph shows a moderate negative correlation between alcohol and density. As seen in the previous graph, density does not have much effect on wine quality even though alcohol does: as alcohol increases, wine quality tends to increase.
These three chemical properties all describe acidity in wine. As the graph shows, citric acid and fixed acidity each have a weak positive correlation with wine quality, which means they may have a slight effect on it. Volatile acidity, on the other hand, has a negative correlation with wine quality, which suggests it has a negative effect.
This dataset consists of 1,599 observations with 11 chemical properties. My first step was to understand the dataset and what it contains; the main question was then "Do the chemical properties of wine affect its quality or not?" The main struggle was that this was my first time using R for exploration and analysis, so I found it difficult to write the right code for each insight, and there were some small details of R I had to learn to get this project done. I learned the basics from the Udacity lectures and used some YouTube videos to understand R. It was also surprising how little background I had on wine properties: I thought it was all about alcohol. After completing this project, I can say that I know enough about red wine.
For future work, I would like to include different kinds of wine, such as white wine, to compare with red wine and determine which one has higher quality.